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Abstract. The theory of zigzag persistence is a substantial extension of persistent homology, and its 
development has enabled the investigation of several unexplored avenues in the area of topological data 
analysis. In this paper, we discuss three applications of zigzag persistence: topological bootstrapping, 
parameter thresholding, and the comparison of witness complexes. 

1. Introduction 

The newly emerging area of topological data analysis attempts to use techniques from algebraic topology 
to study qualitative properties of datasets. Its applicability has been demonstrated in areas as diverse as 
object recognition, sensor networks, and bioinformatics Car09 . The need for a topological approach to data 
analysis tasks is justified by high-dimensional and highly nonlinear data sources which are not amenable to 
traditional statistical techniques. One of the key tools in this field is persistent homology, which produces a 
concise yet meaningful descriptor of an object or point cloud across all spatial scales. 

The standard pipeline for performing a persistent analysis of a point cloud consists of two stages: 

(1) A filtered simplicial complex is built which consists of nested topological spaces {S r }. This filtered 
complex can be regarded as encoding the mutual connectivity information between all points in the 
data set. In practice, since we are only interested in finite simplicial or cell complexes, we have a 
finite sequence 

So — > Si — > . . . — > S n 

where the arrows are inclusions. Standard methods for creating such complexes from geometric data 
include the Vietoris-Rips, Cech, a-shape and witness constructions. 

(2) The persistent homology of this filtered complex is computed. In other words, we fix a field, F, and 
apply the functor H p (— ,F) to the above diagram to obtain the following sequence of finite F-vector 
spaces 

H p (S ,F) ^HpOS^F) -> ...-m p (S„,F) 

The theory of persistent homology, |ZC05j , tells us that such a sequence (which we call a persistence 
module) can be classified up to isomorphism by a multi-set of intervals which we informally call a 
barcode. Long intervals indicate the presence of topological features across a wide range of spatial 
scales. 

The theory of zigzag persistence provides an extension of persistent homology to diagrams of topological 
spaces of the form: 

So <-> Si «-> . . . <-> S n 

where the arrows can point either left or right. In [CdSlOj the algebraic foundations are laid out for the 
theory of zigzag modules which are sequences of vector spaces and linear maps of the form: 

Vq <-> Vi <-> . . . <-> V n 

It is shown that such sequences have the structure of an abelian category and can be classified up to 
isomorphism by a multi-set of intervals {[dj, &j] C {0, . . . , n}}. In |CdSM09] . an algorithm is presented which 
allows for the computation of the interval decomposition of the zigzag module: 

R p (So) o R p (Si) o . . . o R p (S n ) 
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where Si are simplicial (or cell) complexes and the arrows are all of the form S — > SU{a} or SU{<r} <— S. In 
other words, all of the arrows correspond to the addition or deletion of simplices. The algorithm is essentially 
an extension of the persistent homology algorithm presented in [ZC05 . 

In this paper we explore the use of zigzag homology for studying the topological information contained in 
point cloud datasets. The following applications were presented (hypothetically) in [Car09| and will be the 
object of our study for the remainder of the paper. 

• Topological bootstrapping: Suppose that we are presented with a dataset X. We are interested 
in understanding the homology of the entire dataset from subsamples. It may be the case that X is 
very large, or that the points in X are accessed in an online manner through some sort of querying 
process. We would like to understand the homological properties of X through smaller samples 
Xq, . . . , X n . If we compute the persistent homology based on the individual samples, Xi and find 
nontrivial homology classes, we would like to know if these are repeated measurements of the same 
homology class or different ones. To investigate this we consider the union sequence of topological 
spaces: 

Xi —> Xi U Xi + \ <— Xi + i — > . . . 

If we can somehow determine the continuity of homology classes between the terms in the above 
sequence, that would tell us whether the samples are measuring the same homological features or 
not. 

• Thresholding: Imagine that in addition to our dataset, X, we are provided a parameterized filter 
function which we denote f{-,9) ■ X — > K. Given this function, we may consider filtering our dataset 
by selecting those points which are in the top T% ranked by /(•, 9). An example of this is where / is 
an empirical density estimator. Although there may be some statistical rules of thumb for selecting 
the appropriate parameter value, 9, we are interested in understanding how the dataset changes as 
9 varies. The trouble is that there may be no natural relationship between the top T% values for 
different parameters 9 and 9' . To remedy this, one may consider the sequence 

...«- X f [9i,T] -> X f [9 h T}UX f [9 i+u T] «- X f [9 i+1 ,T] 

or possibly 

...->X J [0i,T\<-X / [6 u T\nX J [0u.uT\->X f [0 i¥1 ,T\ <-... 

where Xt [9, T] denotes the top T% values ranked by f(-,6). Computing the homology of the above 
sequences reveals important information about how the the parameter affects the homological prop- 
erties of the dataset. 

• Witness complex comparison: There are numerous methods for modelling a discrete set of 
points, X, in a metric space by a simplicial complex. Examples include the Vietoris-Rips complex, 
Cech complex, and alpha-shape complex. All of these constructions produce a complex with vertex 
set equal to the original dataset in question. However, there is also a type of complex, namely the 
witness complex, that allows one to estimate the topological properties of a point cloud by only 
looking at a subset of the actual points |dSC04j . A subset Iclis designated as the landmark set, 
and its points are the vertices in the witness construction which we denote W(X; L). Points in the 
complement X \ L influence the construction of the witness complex, but do not appear in it. This 
has two main benefits over conventional simplicial complex constructions: 

(1) The witness complex produces approximations that are more resilient to random noise. 

(2) It reduces the computational burden of computing the persistent homology of a large data set. 
However, the question arises of how the persistent homology of the approximating witness complex 
relates to the persistent homology of the entire dataset. We attempt to address this problem in section 
|5] of this paper. While it is not possible to answer this question completely without computing the 
homology for the whole point cloud, we discuss a method for comparing different subsamples. As in 
the previous two applications, we construct a zigzag diagram of topological spaces 

W[X; U) <- W[X; L h L l+1 ) ->■ W(X; L l+1 ) <- W(X; L i+1 ,L i+2 ) ->• . . . 

Homological features that are stable across different landmark selections will manifest themselves as 
long barcodes in the interval decomposition of the homology of the above sequence. 
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2. Contributions 

The three examples found in the previous section are stated in jCdS lO] as potential applications of the 
theory of zigzag persistence. In this paper we discuss in detail the implementation of these ideas and their 
performance on actual datasets. To the best of our knowledge, this is the first such discussion of these 
applications other than in a hypothetical context, and the first implementation. The code for computing 
zigzag homology is included in the Javaplex software package |TVJAll| . The package is freely available for 
download at http : / / code . google . com/p/ j avaplex 

3. Topological Bootstrapping 

Suppose that we are presented with a dataset X. We are interested in understanding the homology of the 
entire dataset from samples, and how the samples relate to each other. The approach will be reminiscent of 
the bootstrap method in statistics (sec ET93 ), where one obtains additional information about a dataset 
by performing resampling. The picture to keep in mind is the following: 




We are interested to know whether homology classes of the samples are measuring the same homological 
feature (as on the left) or different features (as on the right). To evaluate the compatibility of two samples 
Xi and Xj from X we consider their union Xi U Xj. The Vietoris-Rips construction produces a filtered 
simplicial complex from a finite metric space as follows: The vertices of VR(Y, e) consist of the points of the 
metric space Y. An edge [u,v] is in VR(Y, e) if and only if d(u,v) < e. For n > 1, an n-simplex is in the 
complex if and only if all of its faces are. This complex has the property that: 

VRpQ, e) C YR(Xi U X 3 ,e) D YR(Xj,e) 

Thus we may consider the zigzag diagram of vector spaces by applying the functor H p (— ) to the above to 
obtain: 

H p (VR(Xi, e)) -> H P (VR(X, U^ £ ))f- H p (VR(X j; e)) 

The interval decomposition of this zigzag diagram tells us important information about the compatibil- 
ity of the homological features of the complex. If it happens that two classes [a] <E H P (VR(JQ, e)) and 
\0\ G H p (VR(X,-, e)) map to the same element in H p (VR(Xi U Xj,e)), this suggests that [a] and [/3] are 
measurements of the same p-dimensional homological feature. 

This idea can be extended to multiple samples {Xo, ■ ■ ■ X n } from X. This provides us with the diagram: 

. . . -> VR(X,_! Ul„f)f- VR(X U e) -> VR(X, U X i+l , e) «- VR(X t+1 , e) -> VR(X l+1 U X i+2 , e) «- . . . 

where the arrows are inclusions. 

We also note that this can be repeated with other "inclusion preserving" simplicial complex constructions. 
For example, among the parameterized lazy- witness complexes (see |dSC04j ). when i/ = 0we have that: 

LW (X 4 , 1,) C LW (X, U Xj,Li U L 3 ) 

However, as one can see in the paper [dSC04j the behavior of the Vietoris-Rips complex and the lazy-witness 
complex for v — are very similar, therefore there is not much benefit of one over the other. For v ^ the 
construction is not inclusion preserving. 
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3.1. Algorithms. Suppose that we have a point cloud equipped with a metric (in other words, a finite 
metric space) denoted by (X,d) and subsets Xo, . . . ,X n with Xi C X. We also suppose that we have an 
inclusion preserving filtered complex construction which takes in a finite metric space and outputs a filtered 
simplicial (or cell) complex, which we denote by F. We require F to be inclusion preserving (or functorial 
in an appropriate sense) so that X C Y =>■ F(X) C F(Y). Note that in practice we most likely use 
F = VR(— , e) to be the Vietoris-Rips complex. In this section we give two algorithms for computing the 
interval decomposition of the sequence 

. . . -> KpiFiXi.! U X,)) <- H p (F(Xi)) -> H p (F(A, U X i+1 )) <- R p (F(X l+1 )) -> H p (F(X i+1 U X i+2 )) <- . . . 

The first one relies on the incremental addition and removal algorithms presented in section 4 of |CdSM09] . 
These algorithms maintain a set of three matrices Z, B,C which essentially store a consistent basis for the 
vector spaces H p (Si). Let us denote by ADD(er, k, X) the algorithm that updates these matrices (the state) 
with the addition of the simplex a at the index k. The set of intervals is denoted by X, and the ADD 
routine returns a new copy of X possibly updated with a new interval. Similarly, denote by REMOVE(cr, 
k, X) the algorithm that updates the state with the removal of the simplex a at index k. Using these two 
as subroutines, the interval decomposition of the union zigzag sequence may be computed as follows. Note 
that we index the terms F(Xi) with i and the terms F(Xi U using % + i. 



Algorithm 1 Interval decomposition of union zigzag (version 1) 

for a in F(Xq) do 

Call I -f- ADD(cr, 0, X) 
end for 

for i = 0, . . . , n — 1 do 

for a in F(X l U X l+1 ) \ F(X,) do 

Call X <— ADD(c, 
end for 

for a in F(X l+1 ) \ F(Xi) do 

Call 1 <- REMOVE(cr, i + 1, X) 
end for 
end for 

Optionally remove intervals in X supported on half-integral indices. 



Despite the simplicity of the above algorithm, it is also possible to approach the task slightly differently. If 
we compute the persistent homology of F(Xj), F(Xj + i) and F(Xj U X,+i), we are interested in the induced 
action on homology on the inclusion maps i : Xj <—t Xj U Xj + \. However, the only two possibilities are 
that «* is the identity on a homology class [a] G H p (F(Xj)) or is the zero map on [a]. Thus since we are 
given the computed persistent homology of F (Xj UXj+i), we can "match" homology classes. Note that the 
persistent homology algorithm (as well as the algorithm in [CdSM09]) attempts to perform the computation 
incrementally by computing the persistent homology of K U {a} from K. However, in this situation the 
incremental approach is not necessary: We may compute the persistent homology of F(Xj), F(Xj + i) and 
F(Xj U Xj + i) and simply examine which classes are preserved or killed-off by the induced inclusion map. 
We present this algorithm below for the union sequence. The algorithm maintains a collection of intervals, 
which at the j-th stage of the algorithm is denoted by Xj . The completed interval decomposition is given by 
T 

There is one drawback to algorithm [2] in comparison with algorithm [T] Algorithm [2] must compute the 
persistent homology of the individual terms (such as F(Xj)) by calling the ADD routine. Thus, for the 
sequence F(Xj) -> F(Xj U X J+1 ) <- F(X j+1 ), it must call ADD a total of |F(X,-)| + \F(Xj U X j+1 )\ + 
\F(Xj+i)\ times. In comparison, algorithm [l] must call ADD \F(Xj)\ + \F(Xj U X j+1 ) \ F(Xj)\ times, and 
REMOVE \F(X j UX j+1 )\F(X j+ i)\ times. Thus in terms of the total number of ADD and REMOVE calls, 
algorithm [2] is suboptimal in comparison with algorithm [T] which performs the exact minimal number of 
ADD and REMOVE operations. However, the difference is that the second algorithm only requires the ADD 
routine. This has the advantage in that the data structures needed to support the REMOVE operation are 
more complex and require more overhead than those needed to support only the ADD operation. In our 
software implementation, we used algorithm [2] 
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Algorithm 2 Interval decomposition of union zigzag (version 2) 

Initialize X <— {I a = [0, oo] : a is an active homology class in H p (F(X ))} 
for j = 0, . . . , n — 1 do 

Xj <— Zj-i 

for [a] G H p (F(Xj)) do 
for [6] G Rp(F(X j+1 )) do 

if [a] and [b] map to the same homology class in H p (F(Xj U Xj +1 )) then 

Maintain the interval I a — [s a ,oo] corresponding to the homology class a in Xj 
Mark homology classes [a] and [b] as matched 
end if 
end for 
end for 

for [a] G H p (F(Xj)) such that [a] is unmatched do 

End the interval corresponding to the class [a] by changing I a = [s a , oo] to I a «— [s a , j] 
end for 

for [b] G H p (F(Xj + i)) such that [b] is unmatched do 

Start an interval corresponding to [b] by adding I\, = [j, oo] to Xj 
end for 
end for 

Close off all remaining intervals of the form I a = [s a , oo] by setting them to I <— [s a , n] 



Similarly, we may also use the ADD and REMOVE routines to compute the interval decomposition of 
the intersection sequence 

. . . «- Hp^AVi n Xi)) -> R p (F(Xi)) R p {F(X t n -> H p (F(X i+1 )) H p ( J F(X l+1 n X l+2 )) -> . . . 

just as easily. For example, the adaptation of algorithm [T] is shown below. Additionally, one may also adapt 
algorithm [2] to the intersection sequence trivially. 



Algorithm 3 Interval decomposition of intersection zigzag 

for a in F(X ) do 

Call X 4- ADD(cr, 0, X) 
end for 

for i = 0, . . . , n — 1 do 

for a in F(Xi) \ F{Xi n do 
Call I <- REMOVE(cr, i + |, J) 
end for 

for a in ,F(X l+1 ) \ PI X l+1 ) do 

CallX^ ADD(cr, i+ 1, X) 
end for 
end for 

Optionally remove intervals in X supported on half-integral indices. 



3.2. Basic Example. Let us verify our intuition on an example that is easy to verify. The following example 
shows 7 random samples from a dataset containing 10,000 points on a figure-8. Below, we show the Vietoris- 
Rips complexes constructed from each sample Xi for i = 0, . . . , 6. A maximum filtration value of 1.2 was 
used and each sample contains 40 points. 
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8 8 6 




If we assign indices X t <H> i, and X t U X i+1 — X ii+1 i + ~, and we suppress the homology classes in 
the intermediate complexes (the unions) , then the intervals tell us about the "transfer" of homology cycles 
between the different samples. For the above samples, when one uses the union sequence, the following 
intervals are obtained: 

Figure-8 (samples = 7, sample size = 40) (dimension 0) 



Figure-8 (samples = 7, sample size = 40) (dimension 1) 



The cardinality of the lines above a point on the horizontal axis indicates the rank of the corresponding 
homology group. For example, for the sample at index 0, we can see that the homology groups both have 
rank 1 in dimension and 1. At index 1, they have ranks 1 and 2 in dimensions and 1, respectively. 
Furthermore, we also see that the 1-dimensional cycle computed at index is compatible with one of the two 
measured at index 1, due to the continuity of the interval. Similarly, we can see that all of the 0-dimensional 
homology classes are the same (in other words, they represent the same connected component). In indices 
1-3, we can see that the two homology classes measured in dimension 1 are also pairwise compatible. 

We do not consider the intersection sequence with intermediate terms Xi n Xi + \ here since they intersect 
extremely sparsely for random samples. 

Similarly, the following figure shows the interval decomposition of the union sequence where the underlying 
space is a random sample of 10,000 points on the unit 2-sphere. 10 samples of 100 points were taken and a 
maximum filtration value of 1.0 was used for constructing the Vietoris-Rips complexes. Note that at indices 
3 and 6 the random subsample did not find a 2-dimensional homology class. However, for neighboring indices 
which did find a 2-dimensional class the barcode plot shows that these classes are compatible. 
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2 3 4 5 6 7 

2-Sphere (samples = 10, sample size = 100) (dimension 1) 



2 3 4 5 6 7 

2-Sphere (samples = 10, sample size = 100) (dimension 2) 



3.3. Incremental Samples. Using the Vietoris-Rips based sampling framework, we may also investigate 
the role of the sample size. To do this, we generate samples {^o, • • • ,X n } with \Xi\ = iVj. In the example 
below we perform this from a dataset which consists of 10,000 random points on a figure-8. Samples were 
generated of sizes 2, 3, 4, ... , 150, 151, 152. A maximum filtration value of 1.0 was used for the construction 
of the Vietoris-Rips complexes. The following figure shows the barcodes for the homology of this sequence. 



50 



100 



150 



Incremental samples from random points on a figure-8 (f m = 1 .000) (dimension 1 ) 



50 



100 



150 



These barcodes indicate that once the size of the sample is roughly above 100, the correct Betti num- 
bers {1,2} are computed from the samples, and furthermore these homology classes are compatible. It is 
important to note that the samples are not nested, but are independent uniformly random selections. 
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4. Parameterized Filtration 

Suppose that we have a dataset X and a parameterized filtration function f(-,6) : X M. An example 
could be a density estimator which is parameterized by some sort of width or variance parameter. One is 
interested in studying homologically how the filtration function behaves on the dataset for different parameter 
values. Let us define the set: 

Xf[6,T] — {x £ X\ x is among the top T% points ranked by the filtration function /(-,#)} 

We are interested in how the samples relate to each other as we change the parameter 9, thus we consider 
the sequence of samples for a sequence of parameters {#.;}: 

... <- X f [6i,T] -> X f [9 it T\ U X f [6 i+1 ,T] «- X f [9 i+1 ,T] -> . .. 

As done previously, we may apply an inclusion preserving filtered complex construction and compute its 
zigzag homology. For example, we may compute barcodes for: 

(1) ...^H p (VR(X / [^,T]))^H p (VR(X / [^,T]UX / [^ +1 ,T]))^H p (VR(X / [^ +1 ,T])) ->■ ... 

The existence of intervals of positive length suggest the preservation of homological features across several 
values of the parameter 8. Similarly, we may also consider the intersection sequence analogous to the above: 

(2) . . . -> H p ( VR( X f [9 t , T] ) ) <- H p ( VR( X f [B t , T] n X f [9 i+ 1 , T] ) ) -> H p ( VR( X f [9 l+ 1 , T] ) ) «- . . . 

We note that these sequences are no different than those in the previous section. For example, one may 
write Xi = Xf[9i,T] and then the union and intersection sequences are the same as those in section [3j thus 
the same algorithms apply. 

4.1. Image Patch Data Example. In CISZ06] and |Car09j . the authors discuss a topological analysis of 
a dataset consisting of high-contrast patches from a database of natural images. Using a density filtration, 
they conclude that a set of such patches has the topology of a Klein bottle. The dataset under consideration, 
called A4, consists of around 4.5 x 10 6 patches of 3 x 3 pixels appropriately normalized. From this a random 
sample of 5 x 10 4 points is selected which we call Mo- We refer the reader to |LPM03| for additional details 
on the source of the data and the preprocessing steps taken. 

In this situation one possible filter function is the A;-codensity function defined by: 

5(x, k) = S k {x) = d(x, Vk{x)) 

where Vk{x) denotes the fc-th nearest neighbor to the point x. We may think of Sk(x) as being inversely 
related to the density of X at x. The parameter k is a smoothing parameter - large values of k mean that 
the codensity is measured in a large region around x. As before, we define the sets: 

X$[k, T] = {x £ X\ x is among the T% densest points in X as ranked by the codensity 5k} 

Alternatively, we may use a continuously parameterized density estimator: 

1 - 

fix, a) = - S_\K a (x- Xi) 
n * — ' 

i=l 

where 

K a {z) = (V2W)- rf exp f-^T) 

is a circular Gaussian kernel. It turns out that the codensity function 5 and the kernel density function / 
both give the same qualitative conclusions. Our goal is to study the setsA4o t s[k, T] or A4 f[a, T] which for 
notational convenience we denote by Mo[k,T] and A4o[a,T] respectively. 

In |CISZ06j it is shown that when one filters by 5 and sets k = 300 one may recover the "primary circle" 
which can be thought of as containing edges of all orientations. The patches on this circle component can 
be schematically visualized as follows: 
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When using k = 15 one obtains the 3-circle model, where in addition to the primary circle there are two 
other secondary circles. However, in [CISZ06] and |Car09j these results are obtained by selecting a landmark 
set and then computing the persistent homology of a lazy-witness complex, which drastically reduces the 
computational burden. In this case we chose not to do that since we are interested in the compatibility 
between Vietoris-Rips samples. To illustrate the difficulty, for the image patch data we are interested in the 
case where T = 30%, so that we have a sample of 1.5 x 10 4 points. Constructing a Vietoris-Rips complex on 
a dataset of this size is well outside the capability of any known software package for computing persistent 
homology. To remedy this, we do the following. Given A4o[k,T] for a specified value of k and T, we choose 
a subsample of S points via a sequential max-min procedure. The Vietoris-Rips complex is then construct 
from this subsample of size S. In practice, S will be on the order of 100. This procedure is similar to 
computing a (lazy)-witness complex, except that we ensure that the construction is inclusion preserving. 

Taking k = 300, T = 30%, and S = 100 we observe the primary circle very clearly: 

Barcodes for VR(n50000Dct[k=300,T=1 5000], 1.100) (S = 100) (dimension 0) 




Barcodes for VR(n50000Dct[k=300,T=1 5000], 1.100) (S = 100) (dimension 1) 



0.2 



0.4 



0.6 



Taking k = 15, T = 30%, and S — 100 we can recover the 3-circle model, manifested by the fact that 
13, =5. 



10 



ANDREW TAUSZ AND GUNNAR CARLSSON 



Barcodes for VR(n50000Dct[k=15, T=15000], 1.100) (S = 100) (dimension 0) 




Barcodes for VR(n50000Dct[k=15, T=15000], 1.100) (S = 100) (dimension 1) 



0.2 



0.4 



0.6 



0.8 



Similarly, when filtering by / one obtains the three circle model for low values of o and the primary circle 
for higher values. For example, when a = 0.11, we observe: 

Barcodes for VR(n50000Dct[theta=0.1 10000, T=1 5000], 1.100) (S=100) (dimension 0) 




Barcodes for VR(n50000Dct[theta=0.1 10000, T=15000], 1.100) (S=100) (dimension 1) 



0.8 



1 
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Barcodes for VR(n50000Dct[theta=0.200000, T=15000], 1.100) (S=100) (dimension 0) 




Barcodes for VR(n50000Dct[theta=0.200000, T=15000], 1.100) (S=100) (dimension 1) 



0.2 



0.8 



To see the effect of the density parameter, we compute barcodes for the zigzag sequence ([I]) using the 
kernel density estimate /. In the figure below, this is done for a = 0.11, . . . , 0.2, T — 30%, and S = 100. 



It should be noted that the ticks on the horizontal axis indicate the index of the sample, and not the 
parameter value. From this figure, we can see that as the parameter a increases, the 1-cycles from the three 
circle model disappear and only the primary circle remains for larger values. 



5. Witness Complexes 



As stated in the introduction, the witness complex construction relies on the selection of a landmark set 
L <Z X. We are interested in determining how the homology of the witness complexes for different landmark 
selections relate to each other. 
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5.1. Definitions. In this section, we review some definitions for completeness. A more thorough discussion 
of witness complexes can be found in |dSC04j . 

Let X be a simplicial complex, and denote the fc-skeleton of X by Xk . We let L be a subset of the points 
of X, so L C Xq, and let d be a metric on the points X a . The set L is referred to as the landmark set. 

Definition 1. Suppose a is a simplex with vertices in L. A weak witness for a is a point, x, that satisfies 

d(x, u) < d(x, v) for all u G o~,v G L \ a 

In other words, if a is an rt-simplex, then a; is a weak witness for a if the (n + l)-nearest neighbors are 
the vertices of cr. 

Definition 2. The weak witness complex W(X; L) consists of simplicies for which there exists a weak witness 
in X , and such that all subsimplices have weak witnesses. 

For example, suppose that in the figure below we have that L = {A, B, C, D}. Then E is a (weak) witness 
for the edge [B, C] since the 2 nearest neighbors of E in L are B and C. The edge [B, D] is not in the weak 
witness complex since there is no point outside of L that witnesses B and D (in other words B and D are 
not the two nearest neighbors of any point outside L). 




It is important to note that in general the witness construction is not functorial in that if L C L', 
it is not necessarily true that W(X;L) c W(X;L'). If this were true, one would be able to compare 
different landmark selections by forming a zigzag complex of topological spaces with terms W(X; L) ^ 
W(X; LUL') ^ W(X;L'). Due to the failure of this property, we must use an alternate construction known 
as the witness bicomplex. For the proceeding definitions, suppose we have subsets L, M c Xq. 

Definition 3. x in X is a weak biwitness for the bisimplex (a, r) if x is a weak witness for a in L and a 
weak witness for t in M . 

Definition 4. The pair (a, r) is in the weak biwitness complex W{X\ L, M) if all subcells of (a, r) have a 
weak biwitness x G X. 

Note that elements of this complex are pairs (cr, r) with the vertices of a in L and the vertices of r in M. 
We can define the obvious projections 

px : W(X;L,M) W(X;L) 

p 2 : W(X; L, M) -> W(X; M) 

with pi : (cr, t) i— > a and pi : (a, r) i— > r. These are well defined, since the definitions imply that we must 
have that 

W(X; L, M) C W(X; L) x W(X; M) 

The above definitions allow us to consider the following construction. We piece together all of these maps 
to form what we define as the biwitness zigzag complex: 



W(X;L ) <- W(X;Lo,Li) -> W{X;L X ) <- W{X;L X ,L 2 ) -> 
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5.2. A Note on the Selection of the Landmark Points. In practice, if one is computing persistent 
homology using a witness complex, the first stage of the process is the selection of the landmark set. While 
it is possible that a practitioner may have some a priori idea of which points to select, there are two automated 
methods for generating landmark points. The first method is just to select \L\ points uniformly at random. 
The second is to perform a sequential max-min procedure. To do this, an initial point Lq = {lo} is selected. 
Subsequent points are selected by maximizing the minimum distance to the existing landmark set. In other 
words, 



^n+l 



arg maxjmin d(l, U 



k e L n } 



In the figure below, a randomized selection is shown on the left, and a max-min selection is shown on 
the right. Note that the max-min selection produces points that are more evenly spaced. We denote the 
landmark points in L by black cubes. 




■* ft fc«'VT , » " rf* »• .•• 




In this paper we only consider randomized landmark selection. For example, when we show the repeated 
sampling examples below, all of the selections are randomized. The reason for this is that the only source 
of randomness in the sequential max-min procedure is in the selction of the first point. When one computes 
two such landmark sets, very often they share most points with each other. For this reason, the sampling 
results presented later on would not be very interesting with max-min sampling. 

5.3. Algebraic Formulation. Let us analyze the algebraic situation of the biwitness zigzag complex. For 
more background information on related constructions, we refer the reader to |Wei95j or |Lan02] . Consider 
the more general zigzag diagram of differential graded vector spaces (equivalently chain complexes), which 
we call the projection diagram: 



A* 



B, 



where we have that C* is a summand of the tensor product {A®B) t = ® p+(J=>t A p (£)B q . We are interested in 
the action of the projection maps ir^ and on homology, where we define ir^ and ir„ to be the morphisms: 



where ir n is the (graded) endomorphism which is the identity on grade n, and zero elsewhere. 
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J 



It is straightforward to verify that the morphisms 7r* commute with the boundary operator. To see this, 
we use the fact that ir n trivially commutes with d and compute (in the case of it} - the case of 7rJ is similar) : 

j 

'cji-Kn-xida,) + (-1)1°* I A ft 7r n _ x (aj)) 

3 
j 

= d*n(52 c 3 a 3 ® Pi) 
i 

In the above, Xp denotes the sum of the coefficients in the chain 9/3. If we were to have that the second term 
in the second line would be non-zero, then we must have that the degree of ay must be n — 1 in order to not 
get sent to zero by ir n -i- Since \aj\ + \/3j\ — n we must have that \(3j \ — 1. However, in all cases of interest 
(for example simplicial or cellular homology) the sum of the coefficients of the boundary of 1-dimensional 
elements is always zero, so we always have that A^-7r„_ 1 (a J ) = 0. 

Thus, we have that Tt l n descends to a map on homology. As in section [3] it is possible to compute the 
interval decomposition of the projection diagram by looking at projections of homology classes. In section 
5.4 we use this observation to build an algorithm analogous to algorithm [2] 



5.4. Algorithm. In order to understand how the homology of different landmark samples relate to each 
other, we apply the functor H p (— ) to the witness bicomplex diagram, to get a projection diagram: 

R P (W(X; L )) <- fL P {W(X; L , L\)) -> R P (W{X; Li)) <- R P (W{X- L U L 2 )) -)•... 

Let us index the terms in the above diagram as follows: 

R p (W(X;L j ,L j+1 ))^j + ± 

Thus the regular witness complexes are at integral indices, whereas the bicomplexes are at indices of the 
form j + |. 

In this section we describe the algorithm for computing the interval decomposition of the witness projection 
diagram. The algorithm incrementally builds up the multi-set of intervals Xj, starting with Iq containing 
intervals of the form [0,oo] for each "active" generator of R p (W(X; Lq)). Note that this itself relies on the 
computation of the (persistent) homology of the individual witness complexes (see |ZC 05 ) . A homology class 
is said to be active if it corresponds to an infinite interval within the persistent homology H p (W(X; Lj)). 
We also note that the algorithm described below ignores those intervals that correspond to the biwitness 
objects within the witness bicomplex diagram. In other words, all of the intervals produces have integral 
endpoints. In the algorithm, we use the notation ~ to denote the equivalence relation that two cycles differ 
by a boundary. We are now ready to describe the procedure in algorithm [4] 

5.5. Interpretation. Suppose we find in the interval decomposition, an interval of the form [m, m + 1] 
where m is an integer. This means that there is a p-dimensional homology class that is present in the 
witness complexes at indices m and m + 1, and furthermore there is a homology class in V^(A"; L mi L m+ i) 
that projects to both of these. Similarly, an interval of the form [to, n] with m and n integers indicates the 
presence of a homology class that is mapped to by the intermediate bicomplexes for indices j — to, . . . ,n. In 
general, there will be also intervals with half-integer endpoints. These correspond to classes that are born or 
die at the bicomplexes. Depending on the application in mind, it may be useful to either keep these or discard 
these. The interval decomposition is a very parsimonious representation of the witness bicomplex zigzag. 
At any single integer point, the number of "active" intervals indicates the rank of the homology group at 
the given dimension of consideration. Continuity of intervals indicates the preservation of homology classes 
through a bicomplex. 
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Algorithm 4 Interval decomposition of bicomplex zigzag 

Initialize 1q <— {I a = [0, oo] : a is an active homology class in H p (W(X; A)))} 
for j = 1, . . . , n — 1 do 

j ^ — j — i 
for c £ Hp(W(Z;Lj,Lj + i)) do 

Ci ^ 7T^(c) 
C 2 <- nj( C ) 

if 3a e H p (M / (X; Lj)) and 6 e H p (VF(X; Lj+i)) such that a ~ ci and 6 ~ c 2 then 
Maintain the interval J a = [s a , oo] corresponding to the homology class a in Xj 
Mark homology classes a and b as matched 

end if 
end for 

for a E H p (W(X; Lj)) such that a is unmatched do 

End the interval corresponding to the class a by changing I a = [s a , oo] to I a 4— [s a , j] 
end for 

for b £ H p (W(X; Lj + i)) such that b is unmatched do 

Start an interval corresponding to b by adding = [j, oo] to Xj 
end for 
end for 

Close off all remaining intervals of the form I a — [s a , oo] by setting them to / «— [s a , n] 



5.6. Examples. 

5.6.1. A Synthetic Example. Let us verify our intuition on a simple example found in |dS07] . Nevertheless, 
this is an actual example of a witness bicomplex sequence, if we consider the points of X to be the points on 
the unit circle at angles {0, 2tt/3, Att/3}, and the points of Y to be at angles {7r/3, 7t, 5it/3}. The following 
figure shows the bicomplex Z C X x Y. 



(4,3) (2,3) 




(0,5) (0,1) 



Another way of thinking about this is X x Y is a torus cell complex, and Z traces out a loop on its 
1-skeleton. Let us consider 1-dimensional homology. The three elements of the sequence each have a single 
generator, which (using Z/2Z coefficients) are [0,4] + [0,2] + [2,4], ([0,4], [5]) + ([4], [3,5]) + ([0,2], [1]) + 
([2,4], [3]) + ([0], [1,5]) + ([2], [1,3]), and [1,3] + [1,5] + [3,5]. It is easy to see that the generator for the 
bicomplex projects to both homology classes at the endpoints. Thus in dimension 1, we have an interval 
[0, 1]. Similarly, in dimension we have generators [0], ([0], [1]), and [1], thus we also have an interval [0, 1] 
in dimension 0. 

5.6.2. Verification of Specifically Chosen Landmark Points. In the figure below, we show three landmark 
selections on a figure-8. Note that they were constructed so that C = AJJS. Computing the interval 
decomposition for the homology of the sequence 

W(X; A) i- W{X; A, B) ->• W{X; B) 
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we get the intervals {[0, 1]} in dimension 0, and {[0,0], [1, 1]} in dimension 1. In other words, the interval 
decomposition shows us that the homology classes in A and B in dimension 1 are not compatible. If we do 
the same thing for the pair A, C we get the intervals {[0, 1]} in dimension 0, and {[0, 1], [1, 1]} in dimension 
1, which shows us the compatibility of a 1-dimensional homology class between A and C and the presence 
of an isolated one in C. Similarly, we get the same result for the pair B, C. 

It is important to note that in these examples, we have not included those intervals that start or end at 
half-integer indices. The reason for this is that the witness bicomplexes can be quite complicated in general, 
and including such intervals would obscure the intervals of interest - that is the ones that indicate the mutual 
projection to homology classes in different integer indices. 




5.6.3. Long Term Behavior. In the previous two subsections, we showed examples with only two terms. The 
next example shows the intervals for 1000 random landmark samples each of size 20, drawn from a point 
cloud which consists of 1000 randomly selected points on the unit circle. From the figures below, we can see 
that there are long intervals in dimension 1, yet there is not one continuous interval. This is due to the fact 
that it is possible for two different landmark selections to produce 1-cycles that are not homologous within 
the bicomplex. This is expected since the sampling of both the points on the circle as well as the landmark 
points are done randomly. 
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Landmark samples from random points on circle (dimension 1) 



100 200 300 400 500 600 700 800 900 



We can repeat the above using randomly sampled points on a wedge sum of two circles. For the figure 
below, 41 samples of 40 points were selected from 1000 random points on a figurc-8. We only show the 
intervals in dimension 1, since in dimension there is just 1 long interval as in the circle example. It is 
interesting to see that although at any point there are two active intervals (corresponding to the two 1-cycles) , 
the samples do not always match up. 



Landmark samples from random points on a figure-8 (dimension 1 ) 



5 10 15 20 25 30 35 



Additionally if one is interested in doing a more detailed analysis of the compatibility between the two, it 
is possible to select a subset of the original set of samples which were included in "long" intervals. Through 
this process, a practitioner may amplify the homological signal obtained in certain samples. In the figure 
below, this was performed with the samples at indices 6, 7, 10, 15, 16, 17, 21, 22, 23, 24, 25, 26 and 27. 
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Reselected landmark samples from random points on a figure-8 (dimension 1 ) 



10 



12 



Another example is the 2- torus constructed by taking S 1 x S 1 C M 4 . We take 10,000 points uniformly 
at random, and construct 40 samples of 50 points. Furthermore, we only select those landmark choices for 
which the Betti numbers of the witness complexes are {1, 2, 1}. The interval plots for the zigzag homology 
of the bicomplex diagram are: 

Landmark samples from random points on a torus (dimension 1 ) 



10 



15 



20 



25 



30 



35 



40 
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5.6.4. Incremental Behavior. One interesting question to ask is how different size landmark selections relate 
to each other. Using the witness bicomplex framework, this is easy to investigate. We select a list of sizes 
{n 0} . . . , nk} and generate landmark selections {Li} each with m points. The next example shows this where 
the underlying dataset is a random sample of 200 points on the unit circle, and the sizes are {2, 3,4,..., 80} 

Incremental samples from random points on a circle (dimension 1) 



10 



20 



30 



40 



50 



60 



70 



80 



The reason for the long intervals among small landmark set sizes is that in order for an interval to continue 
between two integer indices (two witness complexes), we need there to be sufficiently many witness in the 
complement of the landmark selection that "connect" the two homology classes in the different selections. 
As the size of the landmark sets increase, this criteria (the existence of the appropriate witnesses) fails to be 
satisfied. Similarly, for the figure-8, using the same parameters we get the following: 
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Incremental samples from random points on a figure— 8 (dimension 1) 



10 20 30 40 50 60 70 



5.6.5. Pairwise Comparison. Up until this point, we have avoided talking about the ordering of the terms 
within the zigzag diagram. It is obvious that one may get different interval decompositions using a different 
rearrangement of the witness complexes. Instead, one may wonder whether it is possible to do a mutual 
comparison. To do this one would form a complete graph K n (if we suppose that we have n landmark 
selections {Li}). Vertex i would correspond to the witness complex W(X\Lj), and the edge would 
correspond to W(X;Li) W(X; L i: Lj) — > W(X;Lj). However, for important reasons the homology of 
this "complete" diagram is not algebraically tractable as in the linear case. The reasons for this are beyond 
the scope of this paper, but an interested reader may consult [CdSlOj . 

One can approach this instead by performing pairwise comparisons, rather than a mutual comparison. 
That is, suppose we have n landmark selections, we form all unordered pairs C {1, . . . , n} and compute 
the interval decomposition of the zigzag diagram involving only the selections Li and Lj at one time. 

For the example below, we perform this procedure as follows. For each distinct pair we compute 

the homology of the diagram W(X;Li) <— W(X; Li, Lj) —> W(X;Lj). Our underlying set is a 1000 point 
sample from a figure-8. For visual purposes, we construct a graph on n vertices where we connect the edge 
(i, j) if the interval decomposition is {[0, 1]} in dimension 1 and {[0, 1], [0, 1]} in dimension 1. The vertices 
correspond to different random landmark selections. For the figure below, we used 41 samples each of size 
40. 
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Connectivity graph for points in figure— 8 
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Although the above graph is too dense to have much visual meaning, it contains all of the possible 
compatibility information we wish to know between each sample. 



5.6.6. Image Patch Data. In this section we consider a much more realistic example, namely the image patch 
dataset considered in [CISZ06J and LPM03 . As discussed in section 



4.1 



the set Ain consists of 5 x 10 



high-contrast patches taken from a database of natural images. In [CISZ06 and |Car09| . using witness 
complexes the authors extract both the primary circle and secondary circle models. Using the above witness 
complex comparison framework, it is possible to investigate the role of the landmark sets. 
We before, we use the fc-codensity function: 



5k(x) = d(x, v k (x)) 



where Vk(x) is the fc-th nearest neighbor of the point x. The interpretation is that Sk is inversely proportional 
to the density of the dataset. We may perform the witness complex sampling procedure to verify that there 
is indeed one primary circle in .Mo,* [300, 30%]. We take 50 samples each of size 30, and get the following 
intervals: 
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Image patch dataset: 50 samples of size 30 (dimension 1) 



Note that there are certain samples in which there is no 1-dimensional homology class and there are 
samples in which there are more than one. However, if we force our samples to produce a 1-cycle (by 
rejecting those samples that do not have exactly one), we get the following set of intervals: 

Image patch dataset: 50 samples of size 30 (dimension 0) 



Image patch dataset: 50 samples of size 30 (dimension 1) 



The above figure tells us that all of the samples that measure a 1-dimensional cycle are compatible, 
supporting the conclusion in [A"C09 that there is one primary circle at this density sampling. At this point 
the reader may wonder why the above result is so clean, yet in previous synthetic examples (for example the 
first one in section 5.6.3) there are some points of discontinuity. There are two reasons for this. First, the 
above set of intervals was constructed by selecting only those samples in which a 1-cycle appeared. Second, 
if we consider the first example of section |5.6.3| then we can see that out of 1000 samples there were only 9 
points of discontinuity, meaning that they are relatively rare. 
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6. Concluding Remarks 

In this paper we have demonstrated the use of zigzag persistent homology in three applications: topolog- 
ical bootstrapping, the comparison of thresholding parameters, and the comparison of landmark selections 
for witness complexes. Zigzag persistence allows one to obtain a multi-set of intervals which indicate the 
preservation of homological features across different samples, levelsets or landmark selections. Long intervals 
indicate the stability of such features across many terms in the relevant sequence. Additionally, we have 
shown the use of these techniques in several computational examples and have demonstrated that these 
methodologies reveal important information about nonlinear datasets. 
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